Skip to content

Conversation

toksdotdev
Copy link

@toksdotdev toksdotdev commented Aug 7, 2018

Print out the difference between any two directories to stdout. This includes:

  • Difference between lines in files
  • Existence of a file

A sample output is:

╔════════════════════════════════════════════════════════════════════════╗
║                               Differences                              ║
╠════════════════════════╦═══════════════════════╦═══════════════════════╣
║ Filename               ║ tests/easy/bad/dir1   ║ tests/easy/bad/dir2   ║
╠════════════════════════╬═══════════════════════╬═══════════════════════╣
║ /sample.txt            ║ FILE EXISTS           ║ DOESN'T EXIST         ║
╠════════════════════════╬═══════════════════════╬═══════════════════════╣
║ "/test.txt":1          ║ testing testing       ║ oh no!                ║
╠════════════════════════╬═══════════════════════╬═══════════════════════╣
║ "/test.txt":2          ║ skdjjd                ║                       ║
╚════════════════════════╩═══════════════════════╩═══════════════════════╝
  • Tests created for any new feature or regression tests for bugfixes.
  • cargo test succeeds
  • cargo +nightly clippy succeeds

Copy link
Collaborator

@epage epage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for contributing!

A question I have though is if a view (e.g. table) belongs in dir-diff or only a model (e.g. iterator of results).

This is related to issue #11 which I experimented with in #17. I'm mixed on the results there and haven't had time to further experiment on it.

Additionally, I've been curious about integrating dir diffing into predicates so it can work with a testing library like assert_fs. See assert-rs/predicates-rs#33

@epage
Copy link
Collaborator

epage commented Aug 7, 2018

Also, could you share what your use case is? It'd be helpful to better understand the larger picture of what you are trying to accomplish.

Copy link
Collaborator

@epage epage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for responding to my first round of feedback. In this round, I cover some more design questions and trade offs. You are welcome to go ahead and work on them and we can continue to iterate on each round until we come to a design that works.

I will note that I've known people who tend to get burnt out with that though and I don't want that to happen with you. Something that can help is first discussing the requirements in #11 and assert-rs/predicates-rs#33 and proposing a solution in #11. We can then iterate in the comments of #11 on the concepts before taking the time to write the code.


[dependencies]
walkdir = "2.0.1"
matches = "0.1.7"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is only used for tests. This should be under dev-dependencies so that clients don't get this added to their builds.

For an example, see https://github.com/assert-rs/assert_cmd/blob/master/Cargo.toml

.collect::<Vec<_>>();

if files_a.len() != files_b.len() {
return Err(Error::MissingFiles);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting trade-off between performance and information.

For people who want fast results, this is great. For people who want to know what all is different, this doesn't work out too well.

Two things about this.

First, if we're trying to optimize, I'm unsure which half of this is slower, the part above here (walk two directory tries, put all content into Vec) or the part below (diffing content).

Second, I think it might be better to offer the client the choice.

We can handle both of these if we structured this as an iterator over two directories and then just offered them a diff function on the "tuple", they can then choose whether to filter for all differences, end on the first difference, etc. I'm assuming we can make the iterator by first walking one tree, checking for what exists in the other tree, and then walk the second tree, checking for what doesn't exist in the other tree.

This is how cobalt's diffing works in its tests

let full_path_a = &a_base.as_ref().join(&a);
let full_path_b = &b_base.as_ref().join(&b);

if full_path_a.is_dir() || full_path_b.is_dir() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what if one is a directory and the other isn't? Won't this silently ignore that?


for (a, b) in files_a.into_iter().zip(files_b.into_iter()).into_iter() {
if a != b {
return Err(Error::FileNameMismatch(a, b));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is convenient for writing, I don't think users will understand what a file name mismatch means. Instead this is about one of the two files is missing in one of the trees. Ideally, we tell them that and tell them which one is missing.

let mut a_lines = content_of_a.lines().collect::<Vec<&str>>();
let mut b_lines = content_of_b.lines().collect::<Vec<&str>>();

if a_lines.len() != b_lines.len() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than writing our own file diffing, we should probably use the difference crate .

This is a re-phrasing of a previous posting that is now collapsed:

Depending on how we solve binary files, should this instead use the difference crate rather than re-implementing file diffing ourselves?


#[test]
fn missing_file() {
assert_matches!(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat. I've never heard of this crate before. I'll need to see where it'd simplify my tests.

#[test]
fn missing_file() {
assert_matches!(
dir_diff::see_difference("tests/missing_file/dir1", "tests/missing_file/dir2"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For every test case, we should probably ensure see_difference returns the same result as is_different.

Ok(!a_walker.next().is_none() || !b_walker.next().is_none())
}

/// Identify the differences between two directories.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context: This is more for design input then to implemented right now

We are only offering directory diffing. Subset checks would also be very valuable, possibly more valuable at least when writing tests to keep the "golden" case simple to minimize expanding a test to cover more than is intended.

Later I mention

We can handle both of these if we structured this as an iterator over two directories and then just offered them a diff function on the "tuple", they can then choose whether to filter for all differences, end on the first difference, etc. I'm assuming we can make the iterator by first walking one tree, checking for what exists in the other tree, and then walk the second tree, checking for what doesn't exist in the other tree.

That makes it really easy for us to also offer a subset check, we only do the first iteration. One more benefit for the iteration approach.

//!
//! assert!(dir_diff::is_different("dir/a", "dir/b").unwrap());
//! ```
//!
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the closing of the example code fence removed?

StripPrefix(std::path::StripPrefixError),
WalkDir(walkdir::Error),
/// One directory has more or less files than the other.
MissingFiles,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep this for now to minimize churn as this PR evolves but I suspect we'll want to split out the diffing error information.

@toksdotdev
Copy link
Author

Sorry, I haven't had time to work on this, as I've been busy for the past months. I'll try to address this ASAP.

Sorry for the delay.

@epage
Copy link
Collaborator

epage commented Dec 8, 2018

Understandable; we all have those times.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants